The Evolution of Linear Models in SAS: A Personal Perspective
Abstract
Phenomenal growth in computational power from 1970 through 2010 enabled a parallel expansion in linear model methodology. From humble beginnings in agriculture, linear model applications are now essential in genetics, education, and biostatistics, to name a few fields. Indeed, the meaning of "linear models" has evolved accordingly. Developers at SAS Institute have been at the forefront of the invention and implementation of these methods at the core of statistical science. Pathways will be traced in steps of SAS procedures, beginning with GLM and REG, proceeding through VARCOMP, NLIN, MIXED and GENMOD, and arriving at NLMIXED and GLIMMIX. Along the way, some problems have disappeared, new ones have emerged, and others are still along for the ride.

INTRODUCTION

The purpose of this paper is to chronicle the evolution of linear models in SAS from the perspective of an outsider who has closely followed the progression and whose professional career was influenced by it. Linear models have been at the core of statistical methodology, and SAS procedures followed that pattern.

The year 1976 can be considered the birth date of SAS as we now recognize it. SAS·76 was the first release of SAS Incorporated, so one may think of the time since 1976 as the Common Era of SAS. The hallmark statistical procedure in SAS·76 was GLM. It was highly innovative for its time and caught the attention of statisticians and others engaged in data analysis across the US and beyond.

GLM established a pattern for statistical procedures in SAS. Instead of a large number of special-purpose linear model applications, GLM provided a comprehensive platform that enabled a user to obtain solutions for most problems falling in the arena of linear models: regression analysis, analysis of variance and covariance, and multivariate analysis.
Whereas most of the capabilities of GLM were inspired by statisticians working in agricultural research, GLM became the workhorse procedure for pharmaceutical statisticians and biostatisticians.

A few years later the REG procedure was released. It expanded regression capabilities to include diagnostic techniques that had been the subject of active research and had recently been published in a major textbook by Belsley, Kuh and Welsch (1980). Now the user not only had the capability to compute inferential statistics in regression analysis, but could also obtain statistics to help decide which variables to include in the analysis and to identify problematic data.

The VARCOMP procedure provided estimates of variance components in mixed linear models, giving the user four choices of estimation method that have also been incorporated into later SAS procedures. This procedure, like GLM, brought forth computing machinery that opened the door to evaluation and comparison of statistical methods that had previously been infeasible.

The NLIN procedure, although not really intended for linear models, permitted the formulation of models with linear components, such as segmented polynomials, as nonlinear models.

Capabilities for analysis of categorical data were limited in early versions of SAS. They were enhanced by the CATMOD and GENMOD procedures. CATMOD was based on the methodology of Grizzle, Starmer and Koch (1969), which pioneered the use of linear models for categorical data analysis. A later procedure, GENMOD, was based on the generalized linear models introduced by Nelder and Wedderburn (1972).

During the 1980s GLM added useful enhancements, but was nagged by the need for features to adequately accommodate problems related to the analysis of correlated data. The immensity of this need inspired the development of the MIXED procedure. Now data with random effects and repeated measures could be analyzed by incorporating those features into the statistical model for the data.
Whereas GLM was built around the model for the expected value of the response variable, taking all independent variables as fixed, MIXED is built around models for both the expected value of the response, as a function only of the fixed variables, and the variance of the random effects. This turned the tables in the relation between statistical methodology and its computational implementation. MIXED revealed the need for further development of methods to adjust for the effects of using variance estimates in place of true variances.

Shortly after MIXED was released, macros were provided for fitting nonlinear mixed models and generalized linear mixed models, using MIXED to perform the iterative computations. These macros later evolved into the procedures NLMIXED and GLIMMIX. The GLIMMIX procedure extends the capabilities of GLM and MIXED to generalized linear models.

Statistics and Data Analysis, SAS Global Forum 2011